Serial No. 10/796,283 
Preliminary Amendment Dated: June 17, 2004 

Amendments to the Specification: 

Page 2, the last full paragraph, please replace with the following: 
A second object of the present invention is to provide a noise
adaptation system of speech model, a noise adaptation method, and a noise 
adaptation program for speech recognition that can provide improved speech 
recognition rates by using the result of the clustering. 

Page 3, line 1, through page 6, line 12, please replace with the following: 
In one aspect of the present invention, there is
provided a noise adaptation system of speech model for adapting a speech model 
for any noise to speech to be recognized in a noisy environment, the speech model 
being learned by using clean speech data, the system comprising: clustering 
means for clustering noise-added speech; speech model space generating means 
for generating a tree-structure noisy speech model space based on the result of 
the clustering performed by the clustering means; parameter extracting means 
for extracting a speech feature parameter of input noisy speech to be recognized; 
selecting means for selecting an optimum model from the tree-structure noisy
speech model space generated by the speech model space generating means; and 
linear transformation means for applying linear transformation to the model 
selected by the selecting means so that the model provides a further increased 
likelihood. Because noise-added speech is consistently used both in the 
clustering process and model learning process, optimal clustering for many types
of noise data and an improved accuracy of estimation of a speech model sequence
of input speech can be achieved. 

According to an embodiment of the present invention, there is
provided the noise adaptation system of speech model according to claim 1, 
wherein the clustering means generates the noise-added speech by adding the 
noise to the speech in accordance with a signal-to-noise ratio condition, subtracts 
the mean value of speech cepstral of the generated noise-added speech, generates 
a Gaussian distribution model of each piece of generated noise-added
speech, and calculates the likelihood between the pieces of noise-added speech to 
generate a likelihood matrix to provide a clustering result. This allows noise- 
added speech to be clustered. 
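
For illustration only, the clustering preparation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function names are hypothetical, each piece of noise-added speech is assumed to be already converted to a list of cepstral frames (feature extraction is omitted), and a single diagonal-covariance Gaussian stands in for the Gaussian distribution model.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db, then add it."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def subtract_mean(frames):
    """Cepstral mean subtraction: remove the per-dimension mean over all frames."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

def fit_gaussian(frames, floor=1e-3):
    """Fit one diagonal-covariance Gaussian (an assumed simplification)."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / len(frames), floor)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a fitted Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def likelihood_matrix(pieces):
    """Entry [i][j] is the log-likelihood of piece i under the model of piece j."""
    models = [fit_gaussian(p) for p in pieces]
    return [[avg_log_likelihood(p, m) for m in models] for p in pieces]
```

Any standard clustering procedure could then be applied to the resulting matrix; the choice of procedure is not fixed by this sketch.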

According to another embodiment of the present invention, there
is provided the noise adaptation system according to claim 1 or 2, wherein the 
selecting means selects a model that provides the highest likelihood for the 
speech feature parameter extracted by the parameter extracting means. By 
selecting the model that provides the highest likelihood, the accuracy of speech 
recognition can be improved. 

According to a further embodiment of the present invention, there
is provided the noise adaptation system according to claim 3, wherein the
selecting means selects a model by searching the tree-structure noisy speech
model space downward from the highest to the lowest level. By searching the
tree structure from the highest level to the lowest level, an optimum model can
be selected. 
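
For illustration only, a downward search of a tree-structure model space might look like the sketch below. The exact search strategy is not specified in this passage, so the sketch assumes a greedy descent that follows the best-scoring child at each level and keeps the best-scoring node seen along the path. The `Node` class, the `select_model` function, and the `score` callback (standing in for the likelihood of the extracted speech feature parameter under a node's model) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def select_model(root: Node, score: Callable[[Node], float]) -> Tuple[Node, float]:
    """Greedy top-down search; returns the best-scoring node on the visited path."""
    best, best_score = root, score(root)
    current = root
    while current.children:
        # Descend into the child whose model scores highest (assumed strategy).
        current = max(current.children, key=score)
        s = score(current)
        if s > best_score:
            best, best_score = current, s
    return best, best_score
```

In this sketch the root corresponds to the highest level of the tree and the leaves to the lowest level; a node above the leaves is returned whenever no descendant on the path scores higher.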

In another embodiment of the present invention,
there is provided the noise adaptation system according to one of the preceding
claims, wherein the linear transformation means performs the linear 
transformation on the basis of the model selected by the selecting means to 
increase the likelihood. By performing the linear transformation, the likelihood 
can be maximized. 

In another aspect of the present invention, there is
provided a speech model noise adaptation method for adapting a speech model 
for any noise to speech to be recognized in a noisy environment, the speech model 
being learned by using clean speech data, the method comprising: a clustering 
step of clustering noise-added speech; a speech model space generating step of 
generating a tree-structure noisy speech model space based on the result of the 
clustering performed at the clustering step; a parameter extracting step of 
extracting a speech feature parameter of input noisy speech to be recognized; a 
selecting step of selecting an optimum model from the tree-structure noisy 
speech model space generated at the speech model space generating step; and a 
linear transformation step of applying linear transformation to the model 
selected at the selecting step so that the model provides a further increased 
likelihood. Because noise-added speech is consistently used both in clustering
and model learning, an improved accuracy of estimation of a speech model
sequence of input speech can be achieved. 

According to a variation of the present invention, there is provided
a noise adaptation program for speech recognition that controls a computer to 
adapt a speech model for any noise to speech to be recognized in a noisy 
environment, the speech model being learned by using clean speech data, the 
program comprising: a clustering step of clustering noise-added speech; a speech 
model space generating step of generating a tree-structure noisy speech model 
space based on the result of the clustering performed at the clustering step; a 
parameter extracting step of extracting a speech feature parameter of input 
noisy speech to be recognized; a selecting step of selecting an optimum model 
from the tree-structure noisy speech model space generated at the speech model 
space generating step; and a linear transformation step of applying linear 
transformation to the model selected at the selecting step so that the model 
provides a further increased likelihood. Because noise-added speech is 
consistently used both in clustering and model learning, an improved accuracy of 
estimation of a speech model sequence of input speech can be achieved. 

Page 20, line 1, through page 21, line 2, please replace with the following: 
As has been described, the present invention has advantages that, because
noise-added speech is consistently used both in the clustering and model
learning processes, optimal clustering for many types of noise data and an
improved accuracy of
estimation of a speech model sequence for input speech can be achieved. 

The present invention
has an advantage that noise-added speech can be clustered by adding noise to 
the speech in accordance with signal-to-noise ratio conditions, subtracting the 
mean value of speech cepstral of each of the pieces of generated noise-added 
speech, generating a Gaussian distribution model of each of the pieces of noise- 
added speech, and calculating the likelihood between the pieces of noise-added 
speech to generate a likelihood matrix. 

The present invention
has an advantage that an improved accuracy of speech recognition can be 
achieved by selecting a model that provides the highest likelihood for an 
extracted speech feature parameter. 

The present invention
has an advantage that an optimum model can be selected by searching the
tree-structure noisy speech model space from the highest to the lowest level.

The present invention
has an advantage that the likelihood can be maximized by performing linear 
transformation on the basis of the selected model so as to increase the likelihood.